A Random Walks Method for Text Classification
Abstract
A practical text classification system should be able to exploit both expensive labelled documents and large volumes of cheap unlabelled documents. It should also handle newly arriving samples easily. In this paper, we propose a random walks method for text classification, in which the classification problem is formulated as computing the absorption probabilities of Markov random walks on a weighted graph. We then derive a Laplacian operator for asymmetric graphs and use it to handle the asymmetric transition matrix. We also develop an induction algorithm, based on the random walks method, for newly input documents. In addition, to make full use of the textual information, we propose a difference measure for text data based on language models and KL-divergence, together with a new smoothing technique for it. Finally, we propose an algorithm that eliminates ambiguous states to address the problem of noisy data. Experiments on two well-known data sets, WebKB and 20 Newsgroups, demonstrate the effectiveness of the proposed random walks method.
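The core idea of the abstract can be sketched in a few lines: labelled documents are absorbing states, edge weights come from a KL-divergence difference between document language models, and each unlabelled document receives the probability of being absorbed by each class. This is a minimal sketch under stated assumptions, not the authors' implementation. The Jelinek-Mercer smoothing (parameter `lam`), the exponential kernel `exp(-KL/sigma)` that turns the difference measure into an edge weight, the toy corpus, and every function name are hypothetical choices made for illustration. The paper's own smoothing technique, its asymmetric Laplacian, and its ambiguous-state elimination step are not reproduced. The `induce` function shows one plausible induction step for an unseen document: a single walk step into the fixed graph, followed by the absorption probabilities already computed there.

```python
import math
from collections import Counter


def background_model(docs):
    """Collection-wide unigram model; every vocabulary word gets positive mass."""
    total = Counter(w for d in docs for w in d)
    n = sum(total.values())
    return {w: c / n for w, c in total.items()}


def language_model(tokens, background, lam=0.5):
    """Jelinek-Mercer smoothing (hypothetical stand-in for the paper's own smoothing)."""
    counts = Counter(t for t in tokens if t in background)  # drop out-of-vocabulary words
    n = sum(counts.values())
    if n == 0:
        return dict(background)
    return {w: (1 - lam) * counts[w] / n + lam * pw for w, pw in background.items()}


def kl_divergence(p, q):
    """KL(p || q); q > 0 everywhere because lam > 0 mixes in the background model."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def transition_row(p, models, skip=None, sigma=1.0):
    """Row-normalised weights exp(-KL/sigma); asymmetric because KL is asymmetric."""
    weights = [0.0 if j == skip else math.exp(-kl_divergence(p, q) / sigma)
               for j, q in enumerate(models)]
    z = sum(weights)
    return [w / z for w in weights]


def absorption_probs(P, labels, classes, tol=1e-10, max_iter=10_000):
    """B[u][c]: probability that a walk from u is absorbed by a labelled doc of class c.

    Labelled docs (labels[i] is not None) are absorbing and keep indicator rows;
    for transient docs the loop updates B_u = sum_j P[u][j] * B_j in place
    until the values stop changing.
    """
    n = len(P)
    B = [{c: 1.0 if labels[i] == c else 0.0 for c in classes} for i in range(n)]
    for _ in range(max_iter):
        delta = 0.0
        for u in range(n):
            if labels[u] is not None:
                continue
            for c in classes:
                new = sum(P[u][j] * B[j][c] for j in range(n))
                delta = max(delta, abs(new - B[u][c]))
                B[u][c] = new
        if delta < tol:
            break
    return B


def induce(tokens, background, models, B, classes):
    """Hypothetical induction step: one walk step from an unseen doc into the fixed graph."""
    row = transition_row(language_model(tokens, background), models)
    return {c: sum(r * b[c] for r, b in zip(row, B)) for c in classes}


# Toy corpus (invented for illustration): two labelled docs, two unlabelled docs.
docs = [["ball", "goal", "team"], ["vote", "party", "election"],
        ["goal", "team", "match"], ["party", "vote", "senate"]]
labels = ["sport", "politics", None, None]
classes = ["sport", "politics"]

bg = background_model(docs)
models = [language_model(d, bg) for d in docs]
P = [transition_row(m, models, skip=i) for i, m in enumerate(models)]
B = absorption_probs(P, labels, classes)
scores = induce(["match", "goal"], bg, models, B, classes)
print(B[2], B[3], scores)
```

Because labelled rows are indicator vectors, the class probabilities of each unlabelled document sum to one whenever some labelled document is reachable. The in-place fixed-point loop stands in for the closed form B = (I - Q)^{-1} R of absorbing Markov chains only to keep the sketch dependency-free.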
Similar resources
Text Classification by Markov Random Walks with Reward
We propose a novel model for semisupervised classification by bringing in reward in Markov random walks. Both angle and distance metrics for vectors are combined in this model. Taking advantage of absorbing states, transient analysis of Markov chain can be performed more easily, based on Markov random walks. Diffusion of unlabeled data points makes our approach suffer less from error propagatio...
Author gender identification from text using Bayesian Random Forest
Nowadays, the heavy use of virtual environments and users' connections through social networks such as Facebook, Instagram, and Twitter show, more than ever, the need to identify shared subjects in these environments. There are several applications that benefit from reliable methods for inferring the age and gender of users in social media. Such applications exist across a wide area of fields,...
Identifying Text Polarity Using Random Walks
Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product reviews, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing...
Classification using random walks with binary features
We present a new algorithm for classification based on Markov random walks. We evaluate our method, TUMBL, on the ppattach prepositional phrase attachment data set and report top performance when the amount of training data is severely limited.
A Prelude to the Theory of Random Walks in Random Environments
A random walk on a lattice is one of the most fundamental models in probability theory. When the random walk is inhomogeneous and its inhomogeneity comes from an ergodic stationary process, the walk is called a random walk in a random environment (RWRE). The basic questions such as the law of large numbers (LLN), the central limit theorem (CLT), and the large deviation principle (LDP) are ...
Publication date: 2006